Human Genomics — Latest Matching Preprints

1

A Foundational Exome Resource for Jordan: Dual Ancestry Admixture and Population-Specific Variants to Improve Clinical Variant Interpretation

Froukh, T.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.23.26353895 medRxiv

Top 0.1%

6.4%

The genetic architecture of Middle Eastern populations is underrepresented in global genomic databases. This gap increases the rate of variants of uncertain significance (VUS) and of clinical misinterpretation of genomic data in these populations. Whole-exome sequencing was conducted on 90 healthy individuals from Jordan, and the data were analysed using principal component analysis (PCA) and multi-computational filtering. PCA revealed a double ancestry (EUR-AFR) admixture rather than a triple admixture (EUR-AFR-AMR). More than 3,500 population-specific variants (PSVs) were identified, of which 72% were singletons. Additionally, 19 variants were significantly enriched compared to the maximum allele frequencies in public global databases (Fisher's exact test with Benjamini-Hochberg false discovery rate correction, p-value < 0.05). Consequently, the results suggest reclassifying the VUS that reside in the ECE2 gene as likely benign, and the variants with conflicting classifications of pathogenicity in the genes IL1RN and THPO as benign, based on their significant allele frequency (AF = 0.0389, p-value < 0.05). Furthermore, a pathogenic ClinVar variant was identified in a healthy individual, warranting careful interpretation. The findings underscore the importance of identifying PSVs in order to minimize or even prevent clinical misdiagnosis, and they highlight the unique genetic signature in Jordan. The study serves as a foundational resource for precision medicine in the region.
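
The enrichment test named in this abstract (Fisher's exact test followed by Benjamini-Hochberg correction) can be sketched as below. This is a minimal illustration, not the author's pipeline: the one-sided alternative, the function names, and every allele count are assumptions made for the example. The only figure tied to the abstract is the cohort size of 90 individuals, which gives 180 alleles per autosomal site.

```python
from math import comb


def fisher_enrichment_p(alt_cohort, ref_cohort, alt_db, ref_db):
    """One-sided Fisher's exact p-value for enrichment of the alternate
    allele in the cohort relative to a reference database.
    (One-sidedness is an assumption; the abstract does not specify it.)"""
    total = alt_cohort + ref_cohort + alt_db + ref_db
    total_alt = alt_cohort + alt_db
    cohort_size = alt_cohort + ref_cohort
    # Hypergeometric upper tail: P(X >= alt_cohort)
    tail = sum(
        comb(total_alt, k) * comb(total - total_alt, cohort_size - k)
        for k in range(alt_cohort, min(total_alt, cohort_size) + 1)
    )
    return tail / comb(total, cohort_size)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: (alt in cohort, ref in cohort, alt in database, ref in database).
tables = {
    "variant_A": (7, 173, 50, 99_950),
    "variant_B": (2, 178, 900, 99_100),
    "variant_C": (5, 175, 300, 99_700),
}
raw = [fisher_enrichment_p(*counts) for counts in tables.values()]
for name, p, q in zip(tables, raw, benjamini_hochberg(raw)):
    print(f"{name}: p={p:.3g}, BH-adjusted={q:.3g}")
```

A variant would be called enriched here when its adjusted value falls below 0.05, matching the threshold quoted in the abstract.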

2

Evo 2 Predicts Cardiomyopathy-Associated Variants and Elucidates Their Underlying Mechanisms

Kurozumi, A.; Otsuka, N.; Masamichi, I.; Kawakami, T.; Isagawa, T.; Kodera, S.; Takeda, N.

2026-05-17 genomics 10.64898/2026.05.15.725304 medRxiv

Top 0.1%

4.4%

Background: Although advances in next-generation sequencing have accelerated the identification of genetic variants in cardiomyopathy, interpreting variants of uncertain significance (VUS) remains a clinical challenge. Evo 2 is a high-resolution genomic artificial intelligence model capable of predicting pathogenicity across large sequence contexts and enabling mechanistic interpretation; however, its application in cardiovascular genetics is limited. Here, we evaluated the utility of Evo 2 for assessing the pathogenicity and underlying mechanisms of cardiomyopathy-associated variants. Methods: We used Evo 2 to predict the pathogenicity of single-nucleotide variants in cardiomyopathy-related genes listed in ClinVar. We assessed the ability of the model to identify characteristic structural features in both coding and noncoding regions using internal representations such as embeddings, and to infer the molecular mechanisms of variants within these regions. Results: Evo 2 demonstrated high predictive accuracy for pathogenicity, achieving an AUROC of 0.983 and an AUPRC of 0.915. Notably, sparse autoencoders (SAEs) trained on embeddings identified features corresponding to higher-order protein structure, including coiled-coil and actin-binding domains characteristic of cardiomyopathy-related proteins, and accurately detected mutations known to disrupt these domains. With SAEs, the model recognized the binding motif of the cardiac-enriched transcription factor TBX5, and after supervised fine-tuning it accurately predicted a single-nucleotide polymorphism affecting TBX5 binding affinity. Conclusions: Evo 2 demonstrated strong performance for both predicting pathogenicity and extracting biological features of cardiomyopathy-associated variants. It may represent a powerful emerging tool for evaluating VUS in cardiovascular medicine.

3

A Denisovan-derived Alu insertion in OCA2 contributes to pigmentation diversity in present-day Melanesians

Kim, K.; Pfennig, A.; Syed, S. A.; Moskwa, N.; Oliveira, N. A. J.; Pham, Q.-M.; Hallast, P.; Yilmaz, F.; McDonough, J.; Norton, H. L.; Akey, J. M.; Lee, C.

2026-03-18 genomics 10.64898/2026.03.18.712481 medRxiv

Top 0.1%

3.9%

Modern humans inherited DNA from Neanderthals and Denisovans, but the contribution of introgressed structural variants (SVs) to present-day human phenotypes and adaptation remains poorly understood. Here, we used a graph-genome approach to genotype 96,277 SVs in 3,332 present-day humans and three high-coverage archaic hominin genomes, identifying 153 candidate introgressed SVs. These SVs are enriched for signatures of local adaptation compared to non-introgressed SVs (p-value = 3.04 × 10⁻⁷). Among these, we focused on a Denisovan-derived Alu insertion located in intron 18 of OCA2, a gene central to pigmentation. This introgressed Alu insertion is most frequently observed (> 60%) in Indigenous people from Bougainville Island of Melanesia, and is significantly associated with increased skin pigmentation in this region. To assess its functional impact, the Alu insertion was introduced into human induced pluripotent stem cells (iPSCs), which were subsequently differentiated into melanocytes. Melanocytes harboring the Alu insertion demonstrated elevated OCA2 expression, increased pigmentation, and higher levels of enhancer activity compared to controls. Collectively, these findings highlight introgressed SVs as a significant source of adaptive and phenotypic diversity in modern humans and implicate the Denisovan-derived Alu insertion in OCA2 in pigmentation variation among present-day Melanesian populations.

4

A pilot genome-wide association study of ischemic heart disease with co-occurring arterial hypertension in a Kazakh cohort

Skvortsova, L.; Yergali, K.; Zhaxylykova, A.; Begmanova, M.; Mansharipova, A.

2026-03-23 genetic and genomic medicine 10.64898/2026.03.19.26348868 medRxiv

Top 0.1%

3.3%

Genome-wide association studies (GWAS) of ischemic heart disease (IHD) remain underrepresented in Central Asian populations. We conducted a pilot GWAS of IHD with co-occurring arterial hypertension in a Kazakh cohort to identify candidate loci for future replication. A case-control GWAS was performed in 451 individuals (236 cases and 215 controls). Genotyping was conducted using the Illumina Infinium Global Screening Array-24 v3.0. Association testing was performed using logistic regression under an additive genetic model adjusted for age, sex, and the first ten principal components (PC1 to PC10). Multiple testing correction was applied using the Bonferroni adjustment. As an additional analysis, knowledge-guided GWAS (KGWAS) followed by MAGMA gene-based testing was used to prioritize candidate genes. After quality control, 345,371 variants were tested. Two loci surpassed the Bonferroni-corrected genome-wide significance threshold: rs28898595 at the UGT1A locus (effect allele C; OR = 0.33, 95% CI = 0.23 to 0.49; p = 3.01 × 10⁻⁸) and rs28709059 in an intronic region of the ACTR3C gene (effect allele C; OR = 0.4, 95% CI = 0.29 to 0.55; p = 4.08 × 10⁻⁸). Several additional loci showed suggestive evidence of association. In gene-level analysis, the CSMD1 gene demonstrated a significant association signal in MAGMA with both the European (p = 1.16 × 10⁻¹¹) and East Asian (p = 9.07 × 10⁻¹¹) LD reference panels. This pilot study identifies genome-wide significant loci (UGT1A, ACTR3C) and supports CSMD1 as a prioritized candidate gene for the complex phenotype of IHD with co-occurring arterial hypertension in the Kazakh cohort. These findings are preliminary and require replication in larger Central Asian cohorts and further functional validation.
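
The Bonferroni threshold implied by this abstract can be reproduced with a short calculation. The sketch below assumes a family-wise error rate of 0.05 and one test per QC-passed variant; neither is stated explicitly in the abstract, so treat the resulting threshold as illustrative.

```python
# Assumed family-wise alpha; the abstract does not state it.
alpha = 0.05
# Number of variants tested after quality control (from the abstract).
n_tests = 345_371

# Bonferroni: divide alpha by the number of tests.
threshold = alpha / n_tests
print(f"Bonferroni threshold: {threshold:.3e}")

# p-values reported in the abstract for the two lead SNPs.
reported = {"rs28898595": 3.01e-8, "rs28709059": 4.08e-8}
passes = {snp: p < threshold for snp, p in reported.items()}
for snp, ok in passes.items():
    print(snp, "passes" if ok else "fails")
```

Under these assumptions both reported p-values also fall below the conventional genome-wide threshold of 5 × 10⁻⁸, which is stricter than the array-specific Bonferroni value.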

5

Hair follicle-derived epithelial sheet has potential in vitiligo treatment

Li, J.; Chen, J.; Ling, L.; Tan, Z. L.; Sun, T.; Lin, J.; Chen, S.; Uyama, T.; Zhang, Q.; Liu, Q.; Wu, F.; Wu, W.

2026-03-30 dermatology 10.64898/2026.03.24.26349027 medRxiv

Top 0.1%

2.1%

Vitiligo is an acquired pigmentary disorder of the skin and mucous membranes. Previous work has demonstrated that autologous cultured epithelial grafts (ACEG) are an effective treatment for stable vitiligo. However, extraction of full-thickness skin may result in scar formation at the donor site, which has hindered wider application of this technology, especially for patients requiring large-area transplantation. The hair follicle, as a source of keratinocytes and melanocytes, could be a source of cells for preparing autologous cultured sheets. Through culture system optimization, we demonstrated maintenance of undifferentiated hair follicle-derived cells in a feeder-independent culture system. After expansion, the hair follicle cells were directed to differentiate into a multi-layered, epidermis-like sheet. Cell identity, viability, purity, genomic stability, and antiseptic testing of the hair follicle-derived epithelial sheet (HFES) were evaluated to ensure its safety. Immunofluorescence staining showed that basal keratinocytes were the main cell type of the autologous HFES. Optimization of culture conditions led to increased melanocyte proliferation and functionality. Transcriptomic analysis confirmed upregulation of melanosome maturation genes. The cell proportions were also similar to the cellular composition under physiological conditions. Transplantation of HFES to depigmented areas in patients with stable vitiligo resulted in skin repigmentation. This technology provides a novel therapeutic option for vitiligo management.

6

Clinical Predictors of Outcome in Nonsegmental Vitiligo: A Prospective Cohort Study

Kumari, L.; K, S.; Nagpal, S.; Gupta, V.; Pandey, S.; Sahni, K.; Ramam, M.; Gupta, S.

2026-05-05 dermatology 10.64898/2026.04.29.26352012 medRxiv

Top 0.1%

2.0%

Background: Non-segmental vitiligo (NSV) shows marked heterogeneity in activity, progression, and treatment response. Reliable clinical markers that predict prognosis and patient-reported outcomes are lacking. Objectives: To identify demographic and clinical predictors of disease extent, progression, repigmentation, treatment dependency, noticeability, and psychosocial impact in NSV. Methods: In this prospective cohort study, 275 patients with NSV were followed for 12 months. Sixteen baseline variables, including demographic features, autoimmune history, and clinical markers (koebnerization, confetti and trichrome patterns, leukotrichia, and mucosal, acral, and periorificial involvement), were recorded. Outcomes included body surface area (BSA), progression, repigmentation, treatment dependency, Vitiligo Noticeability Scale (VNS), and quality-of-life indices (VIS-22, DLQI, C-DLQI, F-VIS). Multivariable analyses and cluster analysis were performed at 6 and 12 months. Results: Markers of disease activity (leukotrichia, trichrome and confetti lesions, koebnerization, and mucosal, acral, and periorificial involvement) were strongly associated with greater BSA, poor repigmentation, higher noticeability, and treatment dependency. Leukotrichia was a consistent predictor of poor repigmentation and high VNS. Family history of autoimmunity predicted progression and treatment dependency. Early-onset vitiligo showed lower disease extent but greater family-related psychosocial burden. Cluster analysis identified severe, intermediate, and mild phenotypes with distinct therapeutic responses. Conclusions: Simple clinical markers can stratify NSV patients into prognostic subgroups, enabling individualized treatment and counseling.
Plain Language Summary: Vitiligo behaves differently in different people: some have a slow-spreading course, while others develop widespread or persistent patches. In this study, we followed 275 people with non-segmental vitiligo for one year to find signs on the skin that could predict how the disease would behave and how it would affect daily life. We found that features such as white hair within patches (leukotrichia), speckled (confetti) or three-colored (trichrome) lesions, new patches appearing after injury (koebnerization), and involvement of the lips, mouth, hands, and feet were linked to more severe disease, poorer response to treatment, and greater cosmetic concern. A family history of autoimmune disease increased the risk of worsening vitiligo. Patients who developed vitiligo early in life had less skin involvement but greater emotional and family-related impact. These easily recognized signs can help doctors and patients plan treatment and set realistic expectations.
Significance of the study: Non-segmental vitiligo (NSV) has a heterogeneous and unpredictable clinical course with variable progression and response to therapy. However, robust prospective data linking clinical markers with long-term outcomes and patient-reported measures remain limited. In our prospective cohort of 275 patients, clinical markers such as leukotrichia, trichrome and confetti lesions, koebnerization, and acral, mucosal, or periorificial involvement were strongly associated with greater disease extent, poorer repigmentation, higher treatment dependency, and increased noticeability. Leukotrichia consistently predicted poor repigmentation. Prognostic stratification based on these markers can therefore improve patient counselling regarding expected repigmentation, treatment duration, and psychosocial burden.

7

Single-cell Landscape of T Cell Heterogeneity in Kawasaki Disease: STAT3/JAK Axis Regulates the Lineage Differentiation Bias of Th17 Cells

Song, S.; Zong, Y.; Xu, Y.; Chen, L.; Zhou, Y.; Chen, L.; Li, G.; Xiao, T.; Huang, M.

2026-03-23 bioinformatics 10.64898/2026.03.18.712795 medRxiv

Top 0.1%

1.9%

Background: Kawasaki disease (KD) is a pediatric systemic vasculitis in which T-cell-mediated immune responses play a pivotal role. However, the dynamic evolution of T-cell subsets during disease progression remains poorly understood. Methods: Single-cell RNA sequencing (scRNA-seq) was employed to perform high-resolution annotation of peripheral blood mononuclear cells (PBMCs) from healthy controls and KD patients, both pre- and post-IVIG treatment. T-cell developmental trajectories were reconstructed via Monocle3-based pseudotime analysis. Furthermore, the functional significance of the identified pathway was validated in a CAWS-induced KD murine model. Results: A high-resolution single-cell landscape identified 13 distinct T-cell subtypes. Pseudotime analysis revealed a significant lineage commitment of CD4+ T cells toward a Th17 phenotype during the acute phase of KD, synchronized with the transcriptional upregulation of the STAT3/JAK signaling axis. Animal experiments further demonstrated that pharmacological inhibition of this pathway substantially attenuated inflammatory infiltration in the cardiac vasculature of KD mice. Conclusion: This study identifies the STAT3/JAK-mediated Th17 differentiation bias as a potential regulatory program associated with acute inflammation in Kawasaki disease, thereby highlighting the STAT3/JAK axis as a potential therapeutic target.

8

Resolution of systemic inflammation in psoriasis following herring roe oil treatment: a post hoc analysis on inflammatory biomarkers in non-severe psoriatic patients

Ringheim-Bakka, T. A.; Gammelsaeter, R.; Tveit, K. S.

2026-04-22 dermatology 10.64898/2026.04.20.26350934 medRxiv

Top 0.1%

1.8%

Background: Psoriasis is a chronic immune-mediated inflammatory disease (IMID) with systemic involvement. In mild-to-moderate disease, circulating cytokines may inadequately capture systemic inflammatory burden. Composite haematological indices derived from complete blood counts, such as the systemic immune-inflammation index (SII) and systemic inflammation response index (SIRI), have emerged as sensitive prognostic markers of systemic inflammation, including in psoriasis. This exploratory post hoc analysis used these biomarkers to investigate the effects of orally administered herring roe oil (HRO), a phospholipid-rich marine oil, on systemic inflammation in patients with mild-to-moderate psoriasis. Methods: Data were analysed from a randomized, double-blind, placebo-controlled 26-week clinical study of HRO supplementation in patients (N = 64) with mild-to-moderate psoriasis (NCT03359577). SII, SIRI, neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), and monocyte-to-lymphocyte ratio (MLR) were calculated at baseline, week 12, and week 26 for patients with available baseline complete blood counts (CBCs) (n = 60). Patients missing baseline CBCs were excluded from the analysis. Continuous changes were assessed using ANCOVA with baseline adjustment. Categorical responder analyses with 25% and 30% reduction thresholds, and stratification by baseline biomarker medians, were performed to evaluate treatment responses and the impact of baseline inflammation. Results: Compared with placebo, HRO treatment resulted in significant mean reductions in SII, SIRI, and PLR at week 26, with supportive trends and responder effects observed as early as week 12. Patients with elevated baseline inflammatory indices showed the greatest reductions in systemic inflammation. Stratification by baseline SII further revealed enhanced clinical benefit, with statistically significant PASI50 response rates in the HRO arm at week 26 among patients with lower baseline SII. Conclusion: HRO supplementation was associated with a time-dependent reduction in systemic inflammatory biomarkers in patients with mild-to-moderate psoriasis. These findings support the utility of composite inflammatory indices for monitoring systemic inflammation and suggest that baseline SII may help predict treatment response and stratify patients in clinical trials in mild-to-moderate psoriasis. These results may also suggest platform potential for HRO in resolution-oriented interventions across several inflammatory conditions.
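
The composite indices named in this abstract are simple ratios of complete blood count values. The sketch below uses the definitions commonly given in the literature (SII = platelets × neutrophils / lymphocytes; SIRI = neutrophils × monocytes / lymphocytes); the abstract does not spell out its formulas, so these definitions, the `CBC` container, and the example counts are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CBC:
    """Absolute counts, all in 10^9 cells/L (hypothetical container)."""
    neutrophils: float
    lymphocytes: float
    monocytes: float
    platelets: float


def inflammatory_indices(cbc: CBC) -> dict:
    """Compute the five CBC-derived indices using the commonly cited definitions."""
    n, l, m, p = cbc.neutrophils, cbc.lymphocytes, cbc.monocytes, cbc.platelets
    return {
        "NLR": n / l,       # neutrophil-to-lymphocyte ratio
        "PLR": p / l,       # platelet-to-lymphocyte ratio
        "MLR": m / l,       # monocyte-to-lymphocyte ratio
        "SII": p * n / l,   # systemic immune-inflammation index
        "SIRI": n * m / l,  # systemic inflammation response index
    }


def is_responder(baseline: float, follow_up: float, threshold: float = 0.25) -> bool:
    """True if the index fell by at least `threshold` (as a fraction) from baseline."""
    return (baseline - follow_up) / baseline >= threshold


# Made-up example values, not trial data.
baseline = inflammatory_indices(CBC(4.0, 2.0, 0.5, 250.0))
week_26 = inflammatory_indices(CBC(3.0, 2.0, 0.5, 250.0))
print(baseline)
print("SII responder at 25%:", is_responder(baseline["SII"], week_26["SII"]))
```

The `threshold` argument mirrors the 25% and 30% reduction cut-offs mentioned in the Methods; how the trial handled ties at the threshold is not stated, so the `>=` comparison is a choice made here.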

9

Genetic loss of JAK1 and cutaneous HPV infection

Fan, S.-Q.; Wang, R.-R.; Colombo, R.; Tang, K.-C.; Liu, J.-W.; Pontoglio, A.; Zhang, L.-L.; Li, K.; Han, S.-R.; Zhang, H.; Bai, X.; Yu, X.; Habulieti, X.; Liu, K.-Q.; Sun, Y.; Sun, L.-W.; Liu, H.; Sun, M.; Lin, Z.-M.; Zhang, F.-R.; Ma, D.-L.; Zhang, X.

2026-04-08 genetic and genomic medicine 10.64898/2026.04.03.26350014 medRxiv

Top 0.1%

1.7%

Background: Human papillomaviruses (HPVs) pose a severe threat to global public health by driving nonmelanoma skin cancer (NMSC) and cervical cancer, with NMSC being one of the most common cancers worldwide. Epidermodysplasia verruciformis (EV) is an inborn error of immunity characterized by an increased susceptibility to persistent infection of cutaneous HPV and a high risk of NMSC. The genetic basis remains unknown in many patients with EV. Methods: We collected four unrelated pedigrees with EV. Genetic analysis identified five variants in JAK1 encoding the Janus kinase 1. Ex vivo models and patient-derived tissue were employed to evaluate the functional effects of JAK1 variants and delineate the pathogenic mechanisms. Results: We identified different variants in JAK1 in four pedigrees with dominant EV. Genetic analysis revealed five novel variants in JAK1, three of which resulted in nonsense-mediated mRNA decay (NMD). Functional assays identified a decreased phosphorylation of the signal transducers and activators of transcription (STATs), impaired interferon responses, and defective T cell activation. Immune dysregulation in patients, characterized by a reduced CD4/CD8 T cell ratio, decreased CD8 naive T cell proportion, and accumulated memory T cells, implies impaired antiviral immunity against HPV. Conclusions: Our findings confirm that JAK1 loss-of-function (LOF) variants underlie susceptibility to cutaneous HPV infection. [Funded by the National Natural Science Foundation of China (81788101, 81230015, 82394420, and 82394423), the National Key Research and Development Program of China (2022YFC2703900), the CAMS Innovation Fund for Medical Sciences (2021-I2M-1-018), and the Regione Lombardia, Italy (Innovative Research Project 1137-2010)].

10

Exome Sequencing Identifies POPDC2 as a Candidate Gene for Familial Congenital Junctional Ectopic Tachycardia

Helm, B. M.; Swan, A. H.; Rinne, S.; Pfuhl, M.; De Martino, E.; Kean, A. C.; Decher, N.; Brand, T.

2026-05-17 cardiovascular medicine 10.64898/2026.05.12.26353039 medRxiv

Top 0.1%

1.7%

Background: Congenital junctional ectopic tachycardia (cJET) is a rare, potentially life-threatening arrhythmia with a suspected genetic basis, yet its molecular underpinnings remain incompletely defined. The POPDC2 gene, involved in cardiac pacemaking and membrane trafficking of interacting ion channels, has not previously been conclusively linked to human tachyarrhythmias. This study investigates a novel POPDC2 variant (p.Leu245Pro) identified in a family with autosomal dominant cJET. Methods: Exome sequencing was performed to identify co-segregating variants in the affected family. Functional analysis of the POPDC2 p.Leu245Pro variant was conducted using molecular dynamics (MD) simulations, a membrane targeting assay, and a bimolecular fluorescence complementation assay. Additionally, the impact of the variant on Nav1.5 and TREK-1 currents was characterized in Xenopus oocytes. Results: The p.Leu245Pro POPDC2 variant destabilized the POPDC1-POPDC2 dimer interface, resulting in impaired heterodimer formation and membrane localization. Electrophysiological studies in Xenopus oocytes demonstrated that the mutant protein significantly affected Nav1.5 and TREK-1 currents. These findings support a functional impact of the POPDC2 p.Leu245Pro variant relevant to cardiac conduction. Conclusions: Our results provide the first functional evidence implicating POPDC2 in cJET and support its role as a novel candidate gene in tachyarrhythmic disease. This study enhances the understanding of genetic contributions to cJET and suggests further investigation of POPDC2 in other forms of supraventricular tachyarrhythmias.

11

Grading of Erythema and Visual Attributes in Atopic Dermatitis across Diverse Skin Tones Using a Vision AI Pipeline

Abdolahnejad, M.; Kyremeh, M.; Smith, J.; Fang, G.; Chan, H. O.; Joshi, R.; Hong, C.

2026-03-31 dermatology 10.64898/2026.03.30.26349755 medRxiv

Top 0.1%

1.7%

Background: Atopic dermatitis (AD) is a prevalent chronic inflammatory skin disease associated with clinical, psychosocial, and economic burden. Accurate severity assessment is essential for guiding treatment escalation and monitoring disease activity, yet clinician-based scoring systems such as the Eczema Area and Severity Index (EASI) are limited by subjectivity and considerable inter- and intra-rater variability. Erythema, a key driver of AD severity grading, is particularly prone to inconsistent evaluation due to differences in ambient lighting, device quality, skin tone, and rater experience, underscoring the need for objective, reproducible assessment tools. Objective: To develop and validate an artificial intelligence (AI) pipeline for grading erythema, excoriation, and lichenification severity in AD from clinical photographs. The study evaluated the agreement of AI severity ratings in each category with those of dermatologists, non-specialists, and a consensus reference standard, with erythema as the primary outcome of interest. Methods: A two-stage AI pipeline was developed using EfficientNet B7 convolutional neural networks (CNNs). The first CNN was trained as a binary AD classifier on 451 AD and 601 non-AD images for lesion detection and segmentation. The second CNN was trained on 173 dermatologist-annotated AD images, which were scored on a 0-3 ordinal scale for erythema, excoriation, and lichenification. This CNN was followed by downstream feature-extraction algorithms, such as red-channel contrast for erythema, Laws' E5L5 for excoriation, and S5L5 texture maps for lichenification. In a cross-sectional validation study, 41 independent test images were scored by two blinded dermatologists and two blinded physicians. AI predictions were compared to individual rater groups and mode-derived consensus scores using weighted Cohen's kappa, classification accuracy, confusion matrices, and error direction analyses.
Results: On internal validation, the severity CNN achieved 84% overall accuracy (averaged across all three attributes), 86% sensitivity, 87% specificity, and a macro-averaged area under the receiver operating characteristic curve (AUC) of 0.90. In the external comparison with blinded human raters, erythema agreement between the AI and dermatologist consensus was substantial (accuracy 80.7%; kappa = 0.68), with no large (>2-point) misclassifications. Physician consensus agreement was lower (accuracy 54.8%; kappa = 0.34), reflecting greater variability among primary care physicians (non-specialists). For excoriation, AI-dermatologist agreement was moderate (accuracy 72.4%; kappa = 0.62); for lichenification, agreement was similar (accuracy 71.4%; kappa = 0.59). Across all features, disagreements were predominantly between adjacent severity categories. The AI was able to generate erythema severity grades for images of darker skin tones that dermatologists typically did not rate and marked as "unable to assess". Limitations: The validation set was small (41 images), severe cases (score 3) were underrepresented, one rater participated in both training annotation and validation scoring, and the sample size was insufficient for robust stratification by skin tone or body site. Conclusion: The AI pipeline demonstrated dermatologist-level accuracy for erythema scoring, consistent moderate agreement for excoriation and lichenification, and a potential advantage in assessing erythema on darker skin tones. These findings support its potential as a standardized, objective tool for AD severity assessment. Prospective validation in larger, more diverse cohorts is warranted.
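
The agreement statistic used in this abstract, weighted Cohen's kappa on a 0-3 ordinal scale, can be sketched as follows. The abstract does not say whether linear or quadratic weights were used; this example assumes linear disagreement weights, and the function name and example scores are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories=4):
    """Linearly weighted Cohen's kappa for ordinal scores 0..n_categories-1.
    (Linear weights are an assumption; the abstract does not specify them.)"""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must supply the same non-zero number of scores")
    n, k = len(rater_a), n_categories

    # Joint distribution of the two raters' scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weight grows linearly with the distance between grades.
    def weight(i, j):
        return abs(i - j) / (k - 1)

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        # Both raters used a single identical category; kappa is undefined.
        return float("nan")
    return 1.0 - obs_dis / exp_dis


# Made-up erythema grades for eight images (not study data).
ai_scores = [0, 1, 1, 2, 2, 3, 1, 0]
consensus = [0, 1, 2, 2, 2, 3, 1, 1]
print(f"weighted kappa: {weighted_kappa(ai_scores, consensus):.2f}")
```

With linear weights, a one-grade disagreement is penalized one third as much as a three-grade disagreement, which suits the abstract's observation that most errors fell between adjacent severity categories.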

12

Fine-scale population structure within and among Malagasy societies

Rakotoarivony, R.; Carter, E. J.; Racimo, F.; Regnier, D.; Ranaivoarisoa, J. F.; Shriver, M.; Perry, G.; Manica, A.; Hodgson, J. A.

2026-05-07 genetics 10.64898/2026.05.04.722645 medRxiv

Top 0.1%

1.5%

The population of Madagascar exhibits a globally unique combination of African and Asian genetic ancestries. Previous studies have described the admixture history of Madagascar at island-wide scales [1,2], but less attention has been paid to fine-scale population structure across the island. We present new genome-wide genetic data from 192 individuals sampled across five regions of Madagascar. We identify population structure at extremely fine spatial scales (~10 km) among the Merina of the central highlands. By analysing subpopulations separately, we found that one Merina group resembled coastal populations in f4 ratios, estimated admixture dates, and pairwise FST distances, while another group resembled other highland individuals in the same measures. This fine-scale substructure is likely associated with historical coastal-to-highland migration during the 18th and 19th centuries. In contrast, we also observe macro-scale structure in the estimated timing of admixture across the island, with southeastern coastal groups exhibiting the earliest estimated admixture timings and northern groups the latest. This pattern corroborates previous results [1,2] and may suggest differing histories of admixture timing among Malagasy populations. Our results emphasise the importance of deep micro-geographic sampling to complement macro-scale analysis when characterising demographic history.

13

Investigating the Y chromosome in complex disease: Phenome-wide scan across 104,334 Finnish men

Preussner, A.; Leinonen, J. T.; FinnGen; Pirinen, M.; Tukiainen, T.

2026-06-10 genetic and genomic medicine 10.64898/2026.06.09.26355235 medRxiv

Top 0.1%

1.5%

Although the Y chromosome represents roughly 2% of the male genome, it is often ignored in genome-wide association studies (GWAS). Consequently, the potential health impacts of Y-chromosomal genetic variation remain incompletely understood. To fill this gap, we performed a phenome-wide association study (PheWAS) in FinnGen across 1,426 binary and quantitative traits using Y-chromosomal variation (frequency ≥ 1%) in 104,334 genotyped men. As Y chromosome variation is prone to population stratification, we performed carefully adjusted association analyses and further examined these through kin-based validation in 19,275 female and 24,712 male first-degree relatives. We found 121 suggestive (p < 5.6 × 10⁻³) phenotypic associations in the Y chromosome, yet none of these were strong enough to reach phenome-wide significance (p < 3.9 × 10⁻⁶). While only 38 associations were supported in the kin-based validation, intriguingly we found support for a previously suggested link between haplogroup I1 and coronary heart disease (CHD; OR = 1.06, 95% CI = 1.02-1.11, p = 3.7 × 10⁻³; male validation OR = 1.05; female validation OR = 0.97). The I1-CHD association was detected across distinct geographical areas within Finland and was independent of loss of Y (LOY) and of autosomal risk for CHD, suggesting a link between germline Y-chromosomal variation and heart disease risk. Overall, this study presents a comprehensive phenome-wide analysis of Y-chromosomal associations, highlighting the potential relevance of Y-chromosomal variation beyond sex determination. Our findings further emphasize the need for improved capture of Y-chromosomal variants and further analyses in biobank-scale data to allow for deeper exploration of the male-specific genetic architecture of complex diseases.

14

Benchmarking sequence performance on the DNBSEQ-T7 using Genome in a Bottle reference genomes

van Coller, A.; Taukobong, S.; Malima, M.; Ghoor, S.; Nangammbi, N.; Roode, E.; Naicker, M.; Cole, V.; Glanzmann, B.; Kinnear, C.; Carstens, N.

2026-05-26 bioinformatics 10.64898/2026.05.22.727100 medRxiv

Top 0.2%

1.3%

Advances in sequencing technologies have improved the accuracy, throughput, and completeness of human genome characterization, enabling more reliable detection of genetic variation. Well-characterized reference genomes are critical for benchmarking sequencing platforms and bioinformatics analysis pipelines. Here, we present whole genome sequencing datasets generated for the Ashkenazi Jewish trio reference samples from the Genome in a Bottle Consortium. Libraries were prepared using three distinct MGI-based workflows: PCR-free library preparation, FastFS DNA library preparation, and Universal DNA library preparation. Sequencing was performed on the MGI DNBSEQ-T7 platform, generating a minimum of 400 million paired-end reads per sample, corresponding to 30X mean genome coverage. Raw reads were processed using a standardized GATK bioinformatics workflow. Sequencing performance and variant detection accuracy were evaluated using the Genome in a Bottle high-confidence benchmark variant sets. All workflows demonstrated high sequencing quality and concordance with GIAB benchmark truth sets, with PCR-free libraries showing the strongest indel calling performance and lowest Mendelian violation rates across the Ashkenazi trio. This dataset provides a resource for benchmarking DNBSEQ-T7 sequencing and bioinformatics workflows, and for evaluating the impact of library preparation strategies on whole genome variant detection performance.

15

Ancestry-stratified variant classification in monogenic diabetes genes: annotation coverage and differential curation burden

Dario, P.

2026-04-07 genetic and genomic medicine 10.64898/2026.04.06.26350230 medRxiv

Top 0.2%

1.3%

The variant databases ClinVar and gnomAD are the backbone of clinical variant interpretation, but their population composition is skewed toward European ancestry. Whether this skew creates systematic classification disadvantages for non-European patients with monogenic diabetes has not been examined at the database level. ClinVar variant_summary (GRCh38, April 2026; 4,421,188 variants) was cross-referenced with gnomAD v4.0 genome data for 17 monogenic diabetes genes. Annotation coverage and variant classification rates were computed stratified by genetic ancestry group (AFR, AMR, EAS, SAS, MID, NFE, FIN, ASJ). Of 14,691 gnomAD variants across the 17 genes, only 29.7% had any ClinVar classification (range: 12.7%-61.3% by gene). Among classified variants, non-Finnish European (NFE) variants had the highest variant of uncertain significance (VUS) rate (32.1%) and the lowest benign/likely benign fraction (41.6%), consistent with a large submission volume without functional follow-up. African-ancestry (AFR) variants showed the second-highest VUS rate (29.2%), not statistically distinguishable from NFE after Bonferroni correction, while all other non-European groups had significantly lower rates (all p < 0.001). GCK showed an inverted pattern, with the non-European VUS rate (18.5%) exceeding the European rate (15.0%), consistent with progressive reclassification in European populations that is absent in non-European cohorts. Annotation coverage and VUS divergence were uncorrelated (r = -0.15, p = 0.57). The primary equity problem is a 70% annotation gap combined with a non-European curation deficit, not a simple VUS excess. Ancestry-stratified evaluation of ClinGen Variant Curation Expert Panel (VCEP) criteria performance is warranted across disease domains.
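
The two stratified quantities in this abstract, annotation coverage and VUS rate among classified variants, can be sketched as follows. The record layout (`group`, `classification`) is a hypothetical simplification; how the preprint assigns a variant to an ancestry group is not stated here, so one record per variant-group pair is assumed.

```python
from collections import defaultdict


def summarize_by_group(variants):
    """Per ancestry group, compute annotation coverage (share of variants
    with any classification) and VUS rate (share of classified variants
    labelled 'VUS').

    variants: iterable of dicts with keys 'group' and 'classification';
    classification is None when the variant has no ClinVar record.
    """
    counts = defaultdict(lambda: {"total": 0, "classified": 0, "vus": 0})
    for variant in variants:
        c = counts[variant["group"]]
        c["total"] += 1
        if variant["classification"] is not None:
            c["classified"] += 1
            if variant["classification"] == "VUS":
                c["vus"] += 1

    summary = {}
    for group, c in counts.items():
        summary[group] = {
            "coverage": c["classified"] / c["total"],
            # VUS rate is undefined when nothing in the group is classified.
            "vus_rate": c["vus"] / c["classified"] if c["classified"] else None,
        }
    return summary


# Invented toy records, for illustration only.
toy_variants = [
    {"group": "AFR", "classification": "VUS"},
    {"group": "AFR", "classification": None},
    {"group": "NFE", "classification": "VUS"},
    {"group": "NFE", "classification": "benign"},
]
toy_summary = summarize_by_group(toy_variants)
```

Keeping coverage and VUS rate as separate numbers mirrors the abstract's point that a group can have a low VUS rate simply because few of its variants have been classified at all.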

16

Assessing the clinical significance of a novel rare variant in Loeys-Dietz Syndrome by combining AI-driven modelling and cell biology

Boukrout, N.; Delage, C.; Comptdaer, T.; Arondal, W.; Jemel, A.; Azabou, N.; Bousnina, M.; Mallouki, M.; Sabaouni, N.; Arbi, R.; Kchaou, S.; Ammar, H.; Hantous-Zannad, S.; Jilani, H.; Elaribi, Y.; Benjemaa, L.; Van der Hauwaert, C.; Larrue, R.; CHEOK, M.; Perrais, M.; Lefebvre, B.; Cauffiez, C.; Pottier, N.

2026-03-31 genetic and genomic medicine 10.64898/2026.03.30.26349510 medRxiv

Top 0.2%

1.3%

Loeys-Dietz syndrome (LDS) is an autosomal dominant connective-tissue disorder caused by genetic variants in TGF-β pathway genes, most often TGFBR1/2. While pathogenic TGFBR2 mutations usually cluster in the kinase domain and disrupt SMAD signalling, confidently distinguishing variants that affect TGFBR2 function from rare benign genetic alterations remains one of the most important ongoing challenges for accurate genetic testing. Therefore, there is a pressing need to develop methods that can improve functional variant interpretation. Here, we describe and characterize the functional impact of a novel genetic variant in the TGFBR2 kinase domain (E431K) in a patient with a clinical diagnosis of syndromic genetic aortopathy. We assessed the structural and functional consequences of this variant using AI-driven molecular modelling and in vitro cell-based assays. A high-quality homology-based model of TGFBR2 was generated, and computational mutagenesis based on the structural context and evolutionary conservation was used to forecast variant pathogenicity. Relative to wild type, the variant affects protein stability by disrupting intramolecular interactions and likely induces conformational changes that may affect kinase activity and thus TGF-β signalling. This was experimentally confirmed by showing abnormal protein levels and altered activation of the canonical TGF-β pathway. Overall, our results establish that the E431K variant leads to aberrant TGF-β signalling and confirm the diagnosis of Loeys-Dietz syndrome type 2 in this patient.

17

Evaluation of the Contribution of Natural Selection to Greater Cardiometabolic Disease Risk in South Asian Populations

Searby, D. J. C.; Hemani, G.; Chong, A.; Lawson, D. J.; Chaturvedi, N. J.; Davey Smith, G.

2026-05-22 genetic and genomic medicine 10.64898/2026.05.15.26353234 medRxiv

Top 0.2%

1.2%

Greater genetic susceptibility has been proposed as an explanation for the higher rates of cardiovascular and metabolic disease in South Asian relative to European populations. We first demonstrate that, after accounting for technical artefacts, the genetic effects for related traits are largely consistent between ancestral groups, which downplays the role of GxG or GxE interactions in driving differential prevalence. If higher genetic susceptibility in South Asians is due to selective pressures acting through adiposity-related traits in the evolutionary past, signatures of selection should be evident at loci associated with cardiometabolic disease and other causally related traits (e.g. fat distribution). We tested for enrichment of several selection statistics (FST, XP-EHH and XP-nSL) at loci associated with a range of traits related to cardiometabolic disease, in comparison to a null distribution of SNPs matched on linkage disequilibrium (LD) score and minor allele frequency (MAF). Loci associated with a subset of these traits (type 2 diabetes mellitus, trunk fat percentage, body fat percentage and trunk fat mass) exhibited enrichment for FST, consistent with a moderate adaptive explanation for their cross-population differentiation. In contrast, none of the studied traits were enriched for haplotype-based statistics, indicating that cross-population genetic divergence is unlikely to have been driven by recent selective sweeps and has more likely arisen from either ancient selection or recent polygenic selection acting on standing variation.
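
The comparison of trait-associated loci against matched null SNPs can be sketched as an empirical enrichment test. This is a minimal illustration, not the authors' pipeline: the function name is hypothetical, the mean is an assumed summary statistic, and the LD-score and MAF matching is assumed to have been done upstream when the null sets were drawn.

```python
from statistics import mean


def empirical_enrichment_p(observed_scores, null_score_sets):
    """One-sided empirical p-value for whether the mean selection statistic
    (e.g. FST) at trait-associated loci exceeds that of matched null sets.

    observed_scores: statistic values at the trait-associated SNPs.
    null_score_sets: list of lists, each the statistic values for one
        randomly drawn set of matched SNPs.
    """
    observed_mean = mean(observed_scores)
    null_means = [mean(scores) for scores in null_score_sets]
    n_as_extreme = sum(1 for m in null_means if m >= observed_mean)
    # Add-one correction so the p-value is never exactly zero.
    return (n_as_extreme + 1) / (len(null_means) + 1)


# Invented toy values, for illustration only.
toy_p = empirical_enrichment_p(
    observed_scores=[0.3, 0.4],
    null_score_sets=[[0.1, 0.1], [0.2, 0.2], [0.5, 0.5]],
)
```

With real data the number of null sets would be in the thousands, since the smallest attainable p-value is 1 / (number of null sets + 1).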

18

Somatic mutation landscape revealed by non-invasive iPSC derivation from urine cells

Bae, T.; Tomasini, L.; Klimczak, L. J.; Kayastha, M.; Suvakov, M.; Jang, Y.; Jourdon, A.; Gordenin, D. A.; Vaccarino, F. M.; Abyzov, A.

2026-04-14 genomics 10.64898/2026.04.11.717904 medRxiv

Top 0.2%

1.2%

Somatic mutations that arise post-zygotically create genetic diversity among normal human cells and provide key insights into human development and aging. Fibroblast-derived induced pluripotent stem cells (iPSCs) have proved to be a useful system for disease modeling; however, due to their clonal nature, iPSC lines carry somatic mutations inherited from the founder cells, raising concerns about their genomic integrity. At the same time, this clonality enables single-cell-level discovery of somatic mutations and the reconstruction of developmental lineages. In living individuals, though, this approach requires invasive biopsies and is limited to skin-derived lineages. Here, we generated 33 urine-derived iPSC lines from four males representing two father-son relationships, performed shallow whole-genome sequencing of the lines and analyzed somatic mutations. Derived iPSCs representing single cells from urine carried a few hundred somatic single-nucleotide variants per genome, dominated by endogenous, clock-like mutational signatures and lacking environmental imprints such as UV-associated mutations. Copy-number analysis identified somatic CNVs in most of the lines and revealed higher CNV burdens in fathers than in sons, consistent with age-related structural mosaicism. Shared mutations across lines enabled reconstruction of cell lineage phylogenetic trees. In summary, urine-derived iPSCs showed genomic alterations comparable to those in fibroblast-derived iPSC lines and represent a valuable non-invasive alternative for disease modeling. Overall, this study provides the first genome-wide characterization of somatic mutations in urine-derived iPSCs and establishes them as a practical and non-invasive platform for charting somatic mutation landscapes and tracing developmental lineages in living humans.

19

UshEffect-3D: Structure-informed Classification of USH2A Missense Variants for Inherited Retinal Disease

Choudhary, D.; Portelli, S.; Ascher, D. B.

2026-04-27 bioinformatics 10.64898/2026.04.23.720479 medRxiv

Top 0.2%

1.2%

Purpose: Variants of uncertain significance (VUS) in USH2A represent a critical interpretive challenge in inherited retinal disease, with over 70% of ClinVar submissions for this gene currently unresolved. We aimed to develop a gene-specific, structure-informed machine learning framework to improve the clinical classification of USH2A missense variants and provide a tractable tool to aid the diagnosis of Usher Syndrome II.
Methods: A dataset of 545 curated USH2A missense variants with established clinical classifications was assembled from ClinVar and LOVD. AlphaFold2-predicted domain structures were used to generate local structural descriptors and biochemical features, which were combined with sequence-based evolutionary conservation scores, yielding 153 candidate features reduced to nine via sequential feature selection. Eleven machine learning classifiers were trained using a 10-fold cross-validation strategy, then independently assessed on a blind test set and validated against 78 ACMG-classified pathogenic variants. Model predictions were benchmarked against five general-purpose variant effect predictors and applied to 2639 USH2A VUS from ClinVar. Feature contributions were analysed using SHAP analysis and ablation studies.
Results: The Random Forest classifier achieved the highest performance on the blind test set, with an MCC of 0.87 and AUC of 0.97. On independent ACMG validation, sensitivity reached 0.73 with perfect precision. UshEffect-3D substantially outperformed all general-purpose predictors, including PolyPhen-2 (MCC = 0.61), AlphaMissense (MCC = 0.42), and ESM-1b (MCC = 0.32). SHAP analysis identified evolutionary conservation as a dominant predictor, with structural stability providing an independent but complementary signal. Applied to 2639 ClinVar VUS, the model prioritised 888 variants (33.6%) as likely pathogenic, particularly enriched within the Laminin N-terminal and Laminin G-like domains.
Conclusions: UshEffect-3D demonstrates that gene-specific, structure-informed machine learning substantially outperforms general-purpose variant effect predictors for USH2A missense variant interpretation. This framework provides a high-confidence prioritization resource for the large unresolved VUS burden in this gene to facilitate earlier molecular resolution of USH2A-associated disease. As gene-directed therapies for USH2A-associated retinal disease advance toward clinical application, accurate and interpretable variant classification will be essential for equitable patient selection. UshEffect-3D is freely accessible via an interactive web server.
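
For readers unfamiliar with the metrics reported above, the following minimal sketch shows how the Matthews correlation coefficient (MCC), sensitivity, and precision are computed from a binary confusion matrix. The counts in the example are invented for illustration and are not the preprint's results.

```python
from math import sqrt


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC from confusion-matrix counts; ranges from -1 to 1.
    Returns 0.0 when a row or column of the matrix is empty."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def sensitivity(tp, fn):
    """Share of truly pathogenic variants that the model calls pathogenic."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of pathogenic calls that are truly pathogenic."""
    return tp / (tp + fp)


# Invented counts, for illustration only.
toy_mcc = matthews_corrcoef(tp=40, fp=0, tn=50, fn=10)
toy_sensitivity = sensitivity(tp=40, fn=10)  # some pathogenic variants missed
toy_precision = precision(tp=40, fp=0)       # no false positives
```

The toy counts show how sensitivity below 1 can coexist with perfect precision: the classifier misses some pathogenic variants but never labels a benign one as pathogenic.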

20

Genomic Footprints of Bottlenecks, Isolation, and Inbreeding: A Case Study of Two Vulture Cohorts in India

Shukla, M.; Bohra, D. L.; Rao, B.; Narayan, L.; Kiran, S.; Thakur, V.

2026-05-05 genomics 10.64898/2026.04.30.721611 medRxiv

Top 0.2%

1.2%

Genomic erosion, a manifestation of small effective population size (Ne) and consanguinity, undermines the long-term persistence of threatened species by compromising their adaptive potential; however, the integration of genomics remains limited in applied conservation efforts to guide priorities. This study combines non-invasive sampling, double-digest restriction site-associated DNA sequencing (ddRAD), and population-genomic analyses to assess genetic health in two vulture assemblages: mixed wild enclosure and captive breeding cohorts. Both geographical locations exhibit signs of populations in distress: low genetic diversity and abundant intermediate-length runs of homozygosity (RoH), consistent with long-term reduced Ne plus recent demographic isolation. Our demographic model runs favoured an ancient migration (AM) topology characterised by an ephemeral window of gene flow, followed by a prolonged period of population separation. The mutation quantification results from approximately 59,000 outgroup-polarised SNPs reveal a higher additive burden and more homozygous-derived sites in BKN. However, this was later traced to low-impact and non-coding variants rather than a surge in loss-of-function (LoF) alleles. The data support a genomic profile that carries an elevated risk from polygenic/aggregate deleterious burden in BKN despite a scarcity of high-impact mutations. By highlighting the disconnect between genetic resilience and demographic recovery, our results accentuate the need to incorporate genomics-informed inbreeding and monitoring programs, while also focusing on reducing anthropogenic mortality with genetic augmentation.
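
The idea behind runs of homozygosity can be illustrated with a toy scan over ordered SNP genotypes on one chromosome. This is a deliberately simplified sketch with a hypothetical function name and threshold; dedicated RoH callers add rules for SNP density, gaps, and genotyping error, and the method used in the preprint is not specified here.

```python
def homozygous_runs(positions, genotypes, min_snps=3):
    """Return (start_pos, end_pos, n_snps) for each stretch of at least
    min_snps consecutive homozygous genotypes.

    positions: SNP coordinates in ascending order on one chromosome.
    genotypes: pairs of allele indices, e.g. (0, 0) or (0, 1).
    """
    runs = []
    start = None  # first position of the current run, if any
    last = None   # most recent homozygous position in the current run
    count = 0     # number of SNPs in the current run
    for pos, (a, b) in zip(positions, genotypes):
        if a == b:
            if start is None:
                start = pos
                count = 0
            count += 1
            last = pos
        else:
            # A heterozygous site ends the current run.
            if start is not None and count >= min_snps:
                runs.append((start, last, count))
            start = None
            count = 0
    # Close a run that extends to the final SNP.
    if start is not None and count >= min_snps:
        runs.append((start, last, count))
    return runs


# Invented toy genotypes, for illustration only.
toy_runs = homozygous_runs(
    positions=[100, 200, 300, 400, 500, 600, 700],
    genotypes=[(0, 0), (1, 1), (0, 0), (0, 1), (1, 1), (1, 1), (0, 0)],
)
```

Run length in base pairs (end minus start) is what distinguishes short, intermediate, and long RoH classes, which in turn are read as signals of older versus more recent inbreeding.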